An analytically tractable model of neural population activity in the presence of 
common input explains higher-order correlations and entropy 



Jakob H Macke/'Q Manfred Opper, 2 and Matthias Bethge 3 

1 Gatsby Computational Neuroscience Unit, University College London and University of Tubingen, Germany 
Artificial Intelligence Group, Technical University Berlin, Germany 
3 Werner Reichhardt Centre for Integrative Neuroscience, 
Bernstein Centre for Computational Neuroscience, 
MPI for Biological Cybernetics and University of Tubingen, Germany 
(Dated: September 20, 2010) 

Simultaneously recorded neurons exhibit correlations whose underlying causes are not known. 
Here, we use a population of threshold neurons receiving correlated inputs to model neural popu- 
lation recordings. We show analytically that small changes in second-order correlations can lead to 
large changes in higher correlations, and that these higher-order correlations have a strong impact 
on the entropy, sparsity and statistical heat capacity of the population. Remarkably, our findings for 
this simple model may explain a couple of surprising effects recently observed in neural population 
recordings. 
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Finding models for capturing the statistical structure 
of firing patterns distributed across multiple neurons is 
a major challenge in sensory neuroscience. Recently, the 
Ising model 1 , originally introduced to understand fer- 
romagnetism, has become popular for studying neural 
population recordings [2H1] ■ The use of the Ising model 
for neural data analysis originates from the fact that it 
constitutes the optimum with respect to the maximum 
entropy (MaxEnt) rationale [5], and thus that devia- 
tions from the model are diagnostic of higher-order in- 
teractions, often referred to as higher-order correlations 
('hocs')[6j. It has been argued that hoes in spike trains 
play a critical role for the underlying population code. 
They have been shown to be stimulus- and scale depen- 
dent, and to affect the sparsity of the population response 
[3]. Studies using MaxEnt models have also raised the 
question of how the joint entropy [7] and the statis- 
tical heat capacity [8] of neural populations or natural 
stimuli [5] scale with population size. 

Here, we provide a parsimonious, tractable population 
model which can account for this multitude of empiri- 
cal observations. We study the effect of hoes in a phe- 
nomenological population model with neurons receiving 
common input. In our model, correlations between bi- 
nary neurons are thought to arise from common Gaussian 
inputs into threshold neurons, and it is thus equivalent 
to the Dichotomized Gaussian distribution (DG) [101 II 1| . 
We show that the statistical properties of the model could 
provide an explanation for some recent experimental ob- 
servations in population recordings. Importantly, we find 
that magnitude of hoes in the DG is strongly modulated 
by pairwise correlations, and in a manner which is consis- 
tent with neural recordings. In addition, we investigate 
the asymptotic scaling of the entropy in the DG and Max- 
Ent models, and show the impact of hoes on the sparsity 
of the population. Finally, we find that the specific heat 



of a population is strongly affected by hoes: It diverges 
with population size for models with all-to-all correla- 
tions beyond second order, and therefore any such model 
will have have a critical point at unit-temperature. 

The Dichotomized Gaussian is a model of correlated 
input. We model a population of n binary neurons Xj, 
where a neuron is said to spike (X; = 1) if its input 
is positive, and to be silent (X = 0) otherwise. The 
inputs are modelled by a correlated Gaussian with mean 
7 and covariance A. For the outputs X to have mean fi 
and covariance X, we choose 7 and A such that An = 1, 

Mi = Hli) an d £<j = $2(7i,7j> A «) ~ $(7i)$(7i)> where 
$(.) is the cumulative distribution function (cdf) of a 
univariate Gaussian, and $2(-i -i A) the cdf of a bivariate 
Gaussian with correlation coefficient A. The equations 
above have a unique solution for any admissible moments, 
and can be solved numerically [TT]. In the special case 
of ^ = fij = 1/2, Aij = sin(27rSy). Fig. [I] a shows 
that, for fixed input correlation and firing probability, 
there is a characteristic relationship between correlations 
and firing probabilities which is similar to that found 
in neural recordings |12j . For analytical tractability, we 
here focus on homogeneous populations, i. e. fa = fi 
and Eij = a, = A V(i ^ 3) [U EC2 Ej. We define 
the pairwise correlation coefficient p = a/(/j,(l — fJ,)). By 
symmetry, all patterns x with the same number of spikes 
are equally likely, and thus the model is fully specified by 
the distribution over spike counts K = J7- Xj. 

The effect of hoes is modulated by pairwise correlations. 
We want to determine how much additional redundancy 
between neurons is induced by the hoes of the correlated 
input model. We define Sdg to be the entropy of the 
full model, S q of the MaxEnt model with interactions of 
order q, as well as A2 = Si — S2 and A^ oc = S2 — Sdg to 
be the reduction in entropy due to second- and higher- 
order correlations. Importantly, A^ oc corresponds to 
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the Kullback-Leilber (KL) divergence, i.e. the expected 
log-likelihood ratio per sample between a model and its 
second-order approximation |15j . a popular measure of 
the magnitude of hoes in neural recordings [21 [3] . 

Figure [T] b shows A^ oc for a population model of size 

11 = 5. Notably, small changes in firing probabilities and 
pairwise correlations can result in large changes in A/ loc . 
For example, a change of correlation coefficient from 0.05 
to 0.1 for p = 0.1 leads to an increase of Ah oc by a factor 
of 10.3 (from 6.6 to 68- 10~ 5 ). This constitutes a possible 
quantitative explanation for the interesting phenomenon 
that hoes are much more pronounced amongst nearby 
cortical neurons [3J, for which also pairwise correlations 
are expected to be higher. It is also consistent with the 
finding that A^ oc is small in retinal recordings with weak 
correlations [21 [7] . Similarly, the 'multi- information ex- 
plained' [2] I 2 = A 2 /(A 2 + A hoc ) of a DG is large, e.g. 

1 2 = 0.987 for p = p = 0.1 [7]. 

We also find that the strain [16] of the DG-model, 
a measure of how much more likely a spike-triplet is 
as a consequence of third-order correlations, is negative 
(—0.04 for fj, = p = 0.1, using log 2 ), and decreases with 
increasing correlation coefficients (Fig. [I] d). This is 
consistent with experimental observations [16j and sur- 
prising, as it has been sugested that a common-input 
model would have a higher occurrence of spike-triplets, 
and thus have positive strain which increases with cor- 
relations |16j . Further simulations with heterogeneous 
correlations in the DG show that its strain is usually neg- 
ative when all three pairwise correlations have the same 
sign. Thus, these statistical properties of our common 
input model are consistent with those observed in small 
neural populations. 

For large populations, A 2 and A^oc scale linearly with 
population size. We are interested in the scaling of the 
entropies of the two models with population size. For the 
DG, the asymptotic probability density of the normalized 
counts R = K/n, which we denote by f(r), r £ (0, 1) is 
given by [15] : 



fDG(r) = 



1 



exp - 



($- x (r) 



(1-2A) 
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j dg * A/(l - 2A) 

We can calculate the asymptotic entropy rate of the 
DG, sdg = lim^ooScG/n by decomposing it into 
the entropy of the spike count and the entropy condi- 
tional on the spike count, S(X) = S{X\K) + S(K). 
We note that S(K) is bounded above by log 2 n, and 
that S(X\K = k) = log 2 (£). Using the identity 
log 2 (£)/n -> -(r log a (r) + (1 - r) log 2 (l - r)) =: r? 2 (r), 
we can see that entropy in this model with all-to-all cor- 
relations is extensive, i.e. does not saturate, but rather 
scales linearly with population size for large n [21 [7] with 
rate s DG = J Q fDG{r)n 2 {r)dr . 

We calculate the maximal entropy for large n by find- 
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FIG. 1: Correlations in the DG a) Correlations increase with 
firing probability /i for constant input correlation A. b) KL 
divergence Ahoc depends on mean firing rate /i and correlation 
p in a population of size n — 5. c) For n = 5, the multi- 
information explained (72) by a DG is very large, d) The 
strain of the homogeneous DG is negative and correlation- 
dependent, e) Asymptotically, I2 between the models can be 
very low for small correlations, f) Scaling of the entropies 
of MaxEnt/DG as a function of population size n for mean 
(i — 0.1, and comparison with asymptotic rates. The entropy 
per neuron drops initially before settling to the asymptotic 
value. For weak correlations, differences between models only 
become substantial for large n. 



ing the spike count distribution Pi S i{k) which maximizes 
H{X\K). The solution of this constrained linear op- 
timization problem is a mixture of two delta peaks, 
fisi(r) = P\5{r - ri) + p 2 S(r - r 2 ) with locations r\$ = 
1/2 ± \J 1/4 — ~p~~\~ p? ~\~ o~ [20] ■ Hence, the asymptotic 
entropy per neuron of the maximum entropy model is 
Sisi — rj 2 (ri). The entropy-rate of the DG for p = 0.1 
and p = 0.05 is 0.35, and the rate of A hoc = 0.016, and 
increases by a factor of 1.75 if correlations increase to 
0.1. For large populations, I 2 of the DG can be much 
lower, e.g. it is 0.57 for p = p = 0.1. Fig. [l]c also shows 
that the close similarity (as measured by I 2 ) between the 
MaxEnt-model and the DG conjectured by [TT] asymp- 
totically holds for firing probabilities near 0.5, but not 
necessarily otherwise. Our results readily generalize to 
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FIG. 2: Population spike count distributions and sparsity: 
a,b) The spike count distributions for the DG (a) and Ising 
model (b) for population size n — 100 and p = 0.1 (rescaled 
by population size n) are substantially different (large-n ap- 
proximations in gray, background). Note that p = 0.25 is the 
critical correlation, and that the Ising model is bimodal. c,d) 
For large n, the DG-population (c) is much sparser than the 
MaxEnt model (d), (same parameters as above). 

populations consisting of a finite number of homogeneous 
pools. In this case, the asymptotic scaling of entropy is 
dominated by the within-pool correlations. Furthermore, 
our results could be used to derive lower bounds on the 
entropy of general MaxEnt models. 

The hoes of the DG increase sparsity. In addition to 
the entropy, hoes also affect other population statistics. 
In particular, we are interested in their effect on the spar- 
sity of the population, which is considered to be an im- 
portant feature of population coding. We quantify spar- 
sity as the probability of the population being quiet [3J, 
i.e. P(K=0). It has been shown [3J that hoes in corti- 
cal networks lead to an increase in sparsity, and this is 
also consistent with the observation that MaxEnt models 
in the retina under-estimate the probability of quiescence 
[5J [S]. We have already derived the count distribution [T7] 
of the DG. From equation ([!]), we can see that the mode 
of f(r) is at 0, i.e. quiescence is the most likely popula- 
tion state whenever the input correlation A exceeds the 
value A = 0.5 (Fig. [2] a), which is a critical point for 
fDG( r )- Interestingly, this is independent of the parame- 
ter 7 controlling the mean firing rate (as long as 7 < 0). 
For small spike probabilities p, even small correlations p 
correspond to a super-critical A (Fig. [l]a). 

For the corresponding Max-Ent distribution, the bi- 
nary infinite range Ising model with P(K = k) = 
exp (h n k + J n k 2 ), we need to identify the scal- 



ing of the parameters h n and J n yielding the desired 
means and correlations. It should be noted that this 
limit is subtly, but critically different from the usual 
thermodynamic limit [TJ [5J Q2]: Scaling J n = J/n 
and h n — h yields a large-n distribution of f(r) ~ 
exp (n (i] e (r) + hr + Jr 2 )) /Z, which collapses to a sin- 
gle delta-peak. Thus, this approach leads to vanishing 
second-order correlations [13] which violate the moment 
constraints. We need to ensure (h + J) = a/n with 
a = (logp 2 — ^°gPi)/( r 2 — r i) to achieve correlations 
of order one, and this yields a large-n distribution of 

fisi(r) = Zr s \ exp (or + n (n e (r) + J(r 2 - r))) (2) 

with J = (log(r 2 ) - log(ri))/(r 2 - r x ). 

Figure [2] shows a comparison of the spike count dis- 
tributions of the two models for n = 100, and the scal- 
ings of the sparsities with population size [21]. We can 
see that the DG has increasing sparsity for super-critical 
correlation p — 0.25. The count distribution of the Max- 
Ent model is bimodal (corresponding to a ferromagnetic 
phase), behaves very much like a mixture of two inde- 
pendent distributions, and has vanishing sparsity. In 
fact, any model with interactions of finite order q will 
asymptotically behave like a mixture of at most q inde- 
pendent distributions [13] , and exhibit similar sparsity 
scaling. Thus, correlations of all orders are necessary 
for achieving a continuous asymptotic spike count distri- 
bution, and the same sparsity scaling as the DG. These 
results were derived assuming that all neurons have iden- 
tical firing rates and correlations. If the population is 
heterogeneous, there could be additional sparsity aris- 
ing, e.g., from neurons with low firing rates. However, 
we conjecture that sparsity in larger populations is still 
strongly affected by hoes. 

Hoes increase heat capacity. Finally, we investigate 
the impact of hoes on the heat capacity of the popula- 
tion. As the heat capacity is proportional to the vari- 
ance of log-probabilities of population states, examining 
it can give insights into coding properties of the popu- 
lation [8]. Furthermore, a sharply peaked and diverging 
specific heat (i.e. heat capacity normalized by popula- 
tion size) is evidence for a physical system being at a 
critical point [TJ |H]. The distribution of a model P(x) 
at temperature T = 1/(3 is given by Pp(x) = P(x)P/Zp, 
and the specific heat by c = Varlog 2 Pp{x)/n. For large 
n, the spike count distribution is Pp{K) = exp(n(l — 
f3)r] e (k/n))P(K) 13 /Z, and asymptotically this yields 

Cj3 = n J fp(r) (ri2{r) 2 — s 2 ^j dr, where fp is the limit- 
ing distribution of Pp(K). 

Therefore, cp diverges linearly whenever this integral 
is non-zero, which is the case for the DG and many other 
models at /3 = 1. For j3 7^ 1, however, fp(r) is dominated 
by the exponential, collapses to a delta-peak, and has fi- 
nite specific heat. Thus, the DG has a critical point at 
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FIG. 3: Scaling of specific heat: a) Specific heat of the DG 
(for mean u = 0.1 and p = 0.1) diverges at T = 1. Inset: Spe- 
cific heat of the DG at T — 1 grows linearly with population 
size. b,c) Specific heats for p, = 0.1 and T = 1 vary with cor- 
relation p for DG (b) and Ising model (c). (gray: asymptotic 
heat, rescaled by 100 for DG). For large n, the Ising model 
attains it maximum at values close to 0. 



T = 1 (Fig. [3] a). This behaviour is independent of the 
originally observed moments, and therefore true for al- 
most any such system. The second-order MaxEnt model 
is a notable exception, in that its fp consists of two sym- 
metric delta-peaks even at T = 1, and that its specific 
heat is, in general, finite for each temperature (Fig. [3] 
a inset). Further simulations with heterogeneous all-to- 
all correlations suggest that the specific heat of the DG 
(but, in general, not of the Ising model) grows linearly in 
n at unit temperature. 

It is therefore informative to calculate the specific heat 
at unit temperature as a function of the moments \x and 
p. In this case, the specific heat of the Ising model is 



r 1 r 2 J 2 (cr + ^ 2 - /i + 1/4) 
4 (1 - 2Jrira) 



log|(e). 



(3) 



Asymptotically, the heat capacity of the MaxEnt model 
is maximized for vanishing correlation, whereas the DG 
attains its maximum at strong correlations, e.g. p = 0.37 
for fx = 0.1 (Fig. |3]b,c). We conclude that hoes can have 
a substantial impact on the specific heat: They lead to 
a qualitatively different scaling behaviour, and strongly 
influence the moments which maximize it. 



Conclusions We showed that a simple binary model 
with common inputs could qualitatively account for a 
variety of empirical observations, including hoes which 
depend on second-order correlations, a negative strain, 
increased sparsity and a divergent specific heat. It is 
worth remarking that all of our formulations can read- 
ily be generalized to more general input distributions or 
spike generation mechanisms. Further investigations will 
have to show whether our results would also quantita- 
tively account for these observations, and how they can 
be rigorously extended to heterogeneous and temporal 
correlations [TS]. 
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This distribution can be derived using the method of 
steepest descent [T3] or by finding the likelihood of an 
input which has probability r of inducing a spike. 
This solution can be verified using the Karush-Kuhn- 
Tucker conditions. This approach can also be used to 
calculate the minimum-entropy distribution. 
We assume p > 0, and r > 0, for very weak correlations, 
other expansions might be more accurate |17| . 



